general game
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
CogniPlay: a work-in-progress Human-like model for General Game Playing
Rautureau, Aloïs, Piette, Éric
--While AI systems have equaled or surpassed human performance in a wide variety of games such as Chess, Go, or Dota 2, describing these systems as truly "human-like" remains far-fetched. Despite their success, they fail to replicate the pattern-based, intuitive decision-making processes observed in human cognition. This paper presents an overview of findings from cognitive psychology and previous efforts to model humanlike behavior in artificial agents, discusses their applicability to General Game Playing (GGP) and introduces our work-in-progress model based on these observations: CogniPlay. Although AI systems have surpassed human performance in games such as Chess [5], Go [14], and competitive games like Dota 2 [2], describing them as "human-like" would be an overstatement. Despite their exceptional performance, these systems fail to accurately replicate the selective, pattern-based decision-making that characterizes human cognition [8], [12].
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
- Europe > Belgium > Wallonia > Walloon Brabant > Louvain-la-Neuve (0.04)
- Overview (0.54)
- Research Report (0.40)
- Leisure & Entertainment > Games > Chess (0.72)
- Leisure & Entertainment > Games > Computer Games (0.69)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.98)
Best Agent Identification for General Game Playing
Stephenson, Matthew, Newcombe, Alex, Piette, Eric, Soemers, Dennis
We present an efficient and generalised procedure to accurately identify the best performing algorithm for each sub-task in a multi-problem domain. Our approach treats this as a set of best arm identification problems for multi-armed bandits, where each bandit corresponds to a specific task and each arm corresponds to a specific algorithm or agent. We propose an optimistic selection process based on the Wilson score interval (Optimistic-WS) that ranks each arm across all bandits in terms of their potential regret reduction. We evaluate the performance of Optimistic-WS on two of the most popular general game domains, the General Video Game AI (GVGAI) framework and the Ludii general game playing system, with the goal of identifying the highest performing agent for each game within a limited number of trials. Compared to previous best arm identification algorithms for multi-armed bandits, our results demonstrate a substantial performance improvement in terms of average simple regret. This novel approach can be used to significantly improve the quality and accuracy of agent evaluation procedures for general game frameworks, as well as other multi-task domains with high algorithm runtimes.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > Netherlands > Limburg > Maastricht (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)
- Information Technology > Data Science > Data Mining > Big Data (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Faster Rates for No-Regret Learning in General Games via Cautious Optimism
Soleymani, Ashkan, Piliouras, Georgios, Farina, Gabriele
We establish the first uncoupled learning algorithm that attains $O(n \log^2 d \log T)$ per-player regret in multi-player general-sum games, where $n$ is the number of players, $d$ is the number of actions available to each player, and $T$ is the number of repetitions of the game. Our results exponentially improve the dependence on $d$ compared to the $O(n\, d \log T)$ regret attainable by Log-Regularized Lifted Optimistic FTRL [Far+22c], and also reduce the dependence on the number of iterations $T$ from $\log^4 T$ to $\log T$ compared to Optimistic Hedge, the previously well-studied algorithm with $O(n \log d \log^4 T)$ regret [DFG21]. Our algorithm is obtained by combining the classic Optimistic Multiplicative Weights Update (OMWU) with an adaptive, non-monotonic learning rate that paces the learning process of the players, making them more cautious when their regret becomes too negative.
Repeated Contracting with Multiple Non-Myopic Agents: Policy Regret and Limited Liability
Collina, Natalie, Gupta, Varun, Roth, Aaron
We study a repeated contracting setting in which a Principal adaptively chooses amongst $k$ Agents at each of $T$ rounds. The Agents are non-myopic, and so a mechanism for the Principal induces a $T$-round extensive form game amongst the Agents. We give several results aimed at understanding an under-explored aspect of contract theory -- the game induced when choosing an Agent to contract with. First, we show that this game admits a pure-strategy \emph{non-responsive} equilibrium amongst the Agents -- informally an equilibrium in which the Agent's actions depend on the history of realized states of nature, but not on the history of each other's actions, and so avoids the complexities of collusion and threats. Next, we show that if the Principal selects Agents using a \emph{monotone} bandit algorithm, then for any concave contract, in any such equilibrium, the Principal obtains no regret to contracting with the best Agent in hindsight -- not just given their realized actions, but also to the counterfactual world in which they had offered a guaranteed $T$-round contract to the best Agent in hindsight, which would have induced a different sequence of actions. Finally, we show that if the Principal selects Agents using a monotone bandit algorithm which guarantees no swap-regret, then the Principal can additionally offer only limited liability contracts (in which the Agent never needs to pay the Principal) while getting no-regret to the counterfactual world in which she offered a linear contract to the best Agent in hindsight -- despite the fact that linear contracts are not limited liability. We instantiate this theorem by demonstrating the existence of a monotone no swap-regret bandit algorithm, which to our knowledge has not previously appeared in the literature.
- North America > United States > Pennsylvania (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Jordan (0.04)
Fast and Knowledge-Free Deep Learning for General Game Playing (Student Abstract)
Maras, Michał, Kępa, Michał, Kowalski, Jakub, Szykuła, Marek
We develop a method of adapting the AlphaZero model to General Game Playing (GGP) that focuses on faster model generation and requires less knowledge to be extracted from the game rules. The dataset generation uses MCTS playing instead of self-play; only the value network is used, and attention layers replace the convolutional ones. This allows us to abandon any assumptions about the action space and board topology. We implement the method within the Regular Boardgames GGP system and show that we can build models outperforming the UCT baseline for most games efficiently.
Hybrid Minimax-MCTS and Difficulty Adjustment for General Game Playing
Vieira, Marco Antônio Athayde de Aguiar, Tavares, Anderson Rocha, Ribas, Renato Perez
Board games are a great source of entertainment for all ages, as they create a competitive and engaging environment, as well as stimulating learning and strategic thinking. It is common for digital versions of board games, as any other type of digital games, to offer the option to select the difficulty of the game. This is usually done by customizing the search parameters of the AI algorithm. However, this approach cannot be extended to General Game Playing agents, as different games might require different parametrization for each difficulty level. In this paper, we present a general approach to implement an artificial intelligence opponent with difficulty levels for zero-sum games, together with a propose of a Minimax-MCTS hybrid algorithm, which combines the minimax search process with GGP aspects of MCTS. This approach was tested in our mobile application LoBoGames, an extensible board games platform, that is intended to have an broad catalog of games, with an emphasis on accessibility: the platform is friendly to visually-impaired users, and is compatible with more than 92\% of Android devices. The tests in this work indicate that both the hybrid Minimax-MCTS and the new difficulty adjustment system are promising GGP approaches that could be expanded in future work.
Optimal No-Regret Learning in General Games: Bounded Regret with Unbounded Step-Sizes via Clairvoyant MWU
Piliouras, Georgios, Sim, Ryann, Skoulakis, Stratis
In this paper we solve the problem of no-regret learning in general games. Specifically, we provide a simple and practical algorithm that achieves constant regret with fixed step-sizes. The cumulative regret of our algorithm provably decreases linearly as the step-size increases. Our findings depart from the prevailing paradigm that vanishing step-sizes are a prerequisite for low regret as championed by all state-of-the-art methods to date. We shift away from this paradigm by defining a novel algorithm that we call Clairvoyant Multiplicative Weights Updates (CMWU). CMWU is Multiplicative Weights Updates (MWU) equipped with a mental model (jointly shared across all agents) about the state of the system in its next period. Each agent records its mixed strategy, i.e., its belief about what it expects to play in the next period, in this shared mental model which is internally updated using MWU without any changes to the real-world behavior up until it equilibrates, thus marking its consistency with the next day's real-world outcome. It is then and only then that agents take action in the real-world, effectively doing so with the "full knowledge" of the state of the system on the next day, i.e., they are clairvoyant. CMWU effectively acts as MWU with one day look-ahead, achieving bounded regret. At a technical level, we establish that self-consistent mental models exist for any choice of step-sizes and provide bounds on the step-size under which their uniqueness and linear-time computation are guaranteed via contraction mapping arguments. Our arguments extend well beyond normal-form games with little effort.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (2 more...)
General Board Game Concepts
Piette, Éric, Stephenson, Matthew, Soemers, Dennis J. N. J., Browne, Cameron
Many games often share common ideas or aspects between them, such as their rules, controls, or playing area. However, in the context of General Game Playing (GGP) for board games, this area remains under-explored. We propose to formalise the notion of "game concept", inspired by terms generally used by game players and designers. Through the Ludii General Game System, we describe concepts for several levels of abstraction, such as the game itself, the moves played, or the states reached. This new GGP feature associated with the ludeme representation of games opens many new lines of research. The creation of a hyper-agent selector, the transfer of AI learning between games, or explaining AI techniques using game terms, can all be facilitated by the use of game concepts. Other applications which can benefit from game concepts are also discussed, such as the generation of plausible reconstructed rules for incomplete ancient games, or the implementation of a board game recommender system.
- Europe > Netherlands > Limburg > Maastricht (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Singapore (0.04)
- (6 more...)
Inducing game rules from varying quality game play
General Game Playing (GGP) is a framework in which an artificial intelligence program is required to play a variety of games successfully. It acts as a test bed for AI and motivator of research. The AI is given a random game description at runtime which it then plays. The framework includes repositories of game rules. The Inductive General Game Playing (IGGP) problem challenges machine learning systems to learn these GGP game rules by watching the game being played. In other words, IGGP is the problem of inducing general game rules from specific game observations. Inductive Logic Programming (ILP) has shown to be a promising approach to this problem though it has been demonstrated that it is still a hard problem for ILP systems. Existing work on IGGP has always assumed that the game player being observed makes random moves. This is not representative of how a human learns to play a game. With random gameplay situations that would normally be encountered when humans play are not present. To address this limitation, we analyse the effect of using intelligent versus random gameplay traces as well as the effect of varying the number of traces in the training set. We use Sancho, the 2014 GGP competition winner, to generate intelligent game traces for a large number of games. We then use the ILP systems, Metagol, Aleph and ILASP to induce game rules from the traces. We train and test the systems on combinations of intelligent and random data including a mixture of both. We also vary the volume of training data. Our results show that whilst some games were learned more effectively in some of the experiments than others no overall trend was statistically significant. The implications of this work are that varying the quality of training data as described in this paper has strong effects on the accuracy of the learned game rules; however one solution does not work for all games.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- (5 more...)